The Human Language Project: Building a Universal Corpus of the World's Languages

Authors

  • Steven P. Abney
  • Steven Bird
Abstract

We present a grand challenge to build a corpus that will include all of the world’s languages, in a consistent structure that permits large-scale cross-linguistic processing, enabling the study of universal linguistics. The focal data types, bilingual texts and lexicons, relate each language to one of a set of reference languages. We propose that the ability to train systems to translate into and out of a given language be the yardstick for determining when we have successfully captured a language. We call on the computational linguistics community to begin work on this Universal Corpus, pursuing the many strands of activity described here, as their contribution to the global effort to document the world’s linguistic heritage before more languages fall silent.

Publication date: 2010